--- layout: teaching title: Web Mapping and Analysis permalink: training/WebMapping/Practical2 ---
Packages within R enable us to extend the functionality of “base R”. New functionality can be added by loading “packages” which are blocks of code containing new functions. These can be written by anyone, and shared on either CRAN (The Comprehensive R Archive Network) or increasingly, directly from the package developers using code sharing platforms such as github.
For this practical we will be using two packages: rgdal which contains functions that will read a shapefile into R, and ggplot2 which is very popular package for visualization.
The first stage is to install the packages. After this has complete, you then need load them. Package loading is required for every new R session.
#Installing packages
install.packages("rgdal", depend = TRUE)
install.packages("ggplot2", depend = TRUE)
install.packages("classInt", depend = TRUE)
install.packages("RColorBrewer", depend = TRUE)
install.packages("maptools", depend = TRUE)
#Load packages
library("rgdal")
## Loading required package: sp
## rgdal: version: 1.0-4, (SVN revision 548)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.2, released 2015/02/10
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rgdal/gdal
## Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.2/Resources/library/rgdal/proj
## Linking to sp version: 1.1-1
library("ggplot2")
library("RColorBrewer")
library("maptools")
## Checking rgeos availability: TRUE
library("classInt")
We are going to use the data created in the last practical, so copy the “census_small.csv” into the folder you are using for this new pratical.
First read in the data we will used for the practical.
census <- read.csv("census_small.csv")
This should look as follows…
head(census)
## X Code Ward PCT_Good_Health
## 1 1 E05000886 Allerton and Hunts Cross 48.97327
## 2 2 E05000887 Anfield 42.20538
## 3 3 E05000888 Belle Vale 40.84911
## 4 4 E05000889 Central 58.62832
## 5 5 E05000890 Childwall 51.90538
## 6 6 E05000891 Church 53.39201
## PCT_Higher_Managerial target PCT_Social_Rented_Households
## 1 10.091491 No 13.005189
## 2 2.912621 Yes 22.772576
## 3 3.931920 Yes 42.555119
## 4 7.019923 No 18.363917
## 5 10.787704 No 6.937488
## 6 17.437790 No 3.025153
The ggplot2 library provides a range of functions that make graphing and visualization of your data both visually appealing and simple to implement. There are two ways in which graphs can be created in ggplot2, the first is ggplot() which we will discuss later, and the second is qlot(), which has a simplified syntax.
We can first create a bar chart using the factor column (“target”) of the data frame object “census”. The “geom” attribute is telling qplot what sort of plot to make. If you remember from the last practical, the target variable were wards within Liverpool where the percentage of people in good health was less than the city mean.
qplot(target, data=census, geom="bar")
We can create a histogram by changing the “geom” and variable being plotted. Try adjusting the bin width, which alters the bins into which the values of the “PCT_Social_Rented_Households” column are aggregated.
qplot(PCT_Social_Rented_Households, data=census, geom="histogram",binwidth=10)
Another very common type of graph is a scatterplot which will typically plot the values of two continuous variables against one another on the x and y axis of the graph. This graph looks at the relationship between the percentage of people in socially rented housing, and those who are occupied in higher managerial roles. The default plot type is a scatterplot, so note in the next couple of examples we do not include geom = "point", however, this could be added and would return the same result (try it!)
qplot(PCT_Social_Rented_Households, PCT_Higher_Managerial, data = census)
In the previous graph, all the points were black, however, if we swap these out for colour, we can highlight a factor variable, which in this case is the “target” column.
qplot(PCT_Social_Rented_Households, PCT_Higher_Managerial, data = census,colour=target)
Alternatively, you can also use “shape” to keep the points as black, but alter their shape by the factor variable.
qplot(PCT_Social_Rented_Households, PCT_Higher_Managerial, data = census,shape=target)
If we want to add a trend line to the plot this is also possible by adding an addition parameter to the “geom”.
qplot(PCT_Social_Rented_Households, PCT_Higher_Managerial, data = census,geom = c("point","smooth"))
## geom_smooth: method="auto" and size of largest group is <1000, so using loess. Use 'method = x' to change the smoothing method.
We might also want a simpler linear regression line; which requires two further parameters including “method”" and “formula”.
qplot(PCT_Social_Rented_Households, PCT_Higher_Managerial, data = census,geom = c("point","smooth"),method="lm", formula=y~x)
To illustrate how to create line plots we will read in some economic data downloaded from the Office for National Statistics which concerns household expenditure since 1948.
household_ex <- read.csv("expenditure.csv")
We can then have a quick look at the data and check on the data class.
head(household_ex)
## Year Millions
## 1 1948 191274
## 2 1949 194639
## 3 1950 200097
## 4 1951 197686
## 5 1952 197993
## 6 1953 206868
lapply(household_ex,class)
## $Year
## [1] "integer"
##
## $Millions
## [1] "integer"
We can now attempt to plot the data.
qplot(Year, Millions, data = household_ex, geom = "line")
On the y axis, ggplot2 has defaulted to using scientific notation. We can change this, however, we will swap to the main ggplot syntax in order to do this. The first stage is to setup the plot, telling ggplot what data to use, and which “aesthetic mappings” (variables!) will be passed to the plotting function. In fact aes() is a function, however never used outside of ggplot(). This is stored in a variable “p”
p <- ggplot(household_ex, aes(Year, Millions))
If you just typed “p” into the terminal this would return an error as you still need to tell ggplot() which type of graphical output is desired. We do this by adding additional parameters using the “+” symbol.
p + geom_line()
Swapping out the scientific notation requires another library called “scales”. Once loaded, we can then add an additional parameter onto the graph.
library(scales)
p + geom_line() + scale_y_continuous(labels = comma)
We can also change the x and y axis labels
library(scales)
p + geom_line() + scale_y_continuous(labels = comma) + labs(x="Years", y="Millions (£)")
Before we can create a map in R, we first need to import some spatial data. We will read in two shapefiles for this example, the first containing polygons that will later be used to create a choropleth map, and the second some street segments that will be used to provide context.
We can import a shapefile into R using the readOGR() function. In the example, we add the name of the shapefile (without the extension) as the second parameter, and add “.” to the first which signifies that R should look for the file within the current working directory.
#Read polygons (creates a SpatialPolygonsDataFrame object)
LSOA <- readOGR(".", "E06000042")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "E06000042"
## with 152 features
## It has 12 fields
#Read lines (creates a SpatialLinesDataFrame object)
roads <- readOGR(".", "Road")
## OGR data source with driver: ESRI Shapefile
## Source: ".", layer: "Road"
## with 12813 features
## It has 1 fields
A Spatial Data Frame object stores a range of attributes derived form the shapefile including the geometry of the features (e.g. polygon shape and location), the attributes for each feature (previously stored in the .dbf) and the projection.
We require a slightly different syntax to view the data contained within the spatial data frame object, and use the “@” symbol. The data is stored in a “slot” called “data”.
head(LSOA@data)
## LSOA11CD imd_rank imd_score income employment education health crime
## 0 E01016710 11648 23.67 0.20 0.14 28.57 0.47 0.36
## 1 E01016711 25604 8.87 0.10 0.07 10.62 -0.60 -0.76
## 2 E01016713 22671 11.27 0.09 0.09 18.38 -0.42 -0.57
## 3 E01016714 8671 29.07 0.24 0.15 37.17 0.44 0.29
## 4 E01016715 10843 25.04 0.17 0.14 31.33 0.33 0.79
## 5 E01016716 11207 24.39 0.19 0.14 24.94 0.57 0.35
## housing living_env idaci idaopi
## 0 17.34 2.35 0.25 0.23
## 1 19.87 2.31 0.17 0.07
## 2 18.40 7.72 0.14 0.08
## 3 22.76 19.28 0.31 0.22
## 4 25.15 3.82 0.22 0.16
## 5 18.58 14.52 0.25 0.16
All the available slots can be viewed as follows
slotNames(LSOA)
## [1] "data" "polygons" "plotOrder" "bbox" "proj4string"
We can use the plot() function which is built into base R to show the outlines of the polygons contained within the “LSOA” object.
plot(LSOA)
This map shows the Lower Layer Super Output Area boundaries for the local authority district of Milton Keynes. The attributes of the data frame are the overall and domain scores for the 2015 Index of Multiple Deprivation.
We will shade in this map using the overall IMD score which is stored in the column “imd_rank”. There are a total of 152 values.
LSOA@data$imd_rank
## [1] 11648 25604 22671 8671 10843 11207 12012 3373 20067 7551 9137
## [12] 22727 25119 12911 11061 6095 10851 20505 6437 15455 18551 7911
## [23] 22588 27851 31225 16778 8497 22734 20954 6565 752 2839 2770
## [34] 18299 9368 23993 29156 24245 21270 18728 16220 24263 14368 24839
## [45] 25995 23522 25124 29853 14061 24259 22557 13307 13909 31203 14125
## [56] 27752 21883 15621 11812 27468 25571 24412 31672 21115 22891 24975
## [67] 32370 22815 11324 27378 15427 27970 24601 28917 27549 30837 28489
## [78] 25950 24948 31850 25107 32657 27560 28558 25444 21660 17228 7774
## [89] 24838 4724 16243 25828 12185 2721 2055 21149 17150 27271 25171
## [100] 30119 19675 21053 5817 20887 20572 13906 23128 20255 20793 32082
## [111] 21992 28958 10631 23268 21406 18398 27979 9328 5327 15527 12061
## [122] 13869 3727 21063 15044 825 1058 771 1544 9739 4201 4959
## [133] 8465 6155 5279 19975 24541 27256 17521 25346 29805 20147 21800
## [144] 28875 27058 27277 30067 25308 28683 15305 20663 17546
The first step is to find suitable breakpoints for the data contained in the imd_rank column. The contiuous data needs to be assigned into categories so different colours can be applied on a choropleth map. There are numerous ways of doing this such as jenks, standard deviations or equal intervals. In this example we use a new function classIntervals() from the “classInt” package to find the ranges needed to divide the the imd_rank into five categories. In this example we use the style “fisher” to specify jenks as the break point.
breaks <- classIntervals(LSOA@data$imd_rank, n = 5, style = "fisher")
If we print the object created by the classIntervals() function, a summary is printed showing you what breaks have been assigned, and how many areas are within these ranges.
breaks
## style: fisher
## one of 20,811,575 possible partitions of this variable into 5 classes
## [752,7058) [7058,13109) [13109,19201.5) [19201.5,26526.5)
## 21 22 24 54
## [26526.5,32657]
## 31
We need to choose some colours which we will assign to each of the break points in the data. We will now use another package called “RColorBrewer” which provides a series of colour pallets that are suitable for mapping. You can have a look at the colour pallets online: http://colorbrewer2.org. Each of these pallets are named; and you can see all the available pallets as follows…
display.brewer.all()
We will then choose six colours from the pallet “YlOrRd”, and print them to the terminal. You will see that the colours are stored as hex values.
my_colours <- brewer.pal(6, "YlOrRd")
my_colours
## [1] "#FFFFB2" "#FED976" "#FEB24C" "#FD8D3C" "#F03B20" "#BD0026"
We can then use the function findColours() to select the appropriate colour for each of the numbers we intend to map, depending on where these fit within the break points we calculated.
colours_to_map <- findColours(breaks, my_colours)
We can then create a basic map using this list of colours and the plot() function again.
plot(LSOA,col=colours_to_map)
We might also want to create a map without the borders, and can be controlled with an additional parameter which is set to “NA”
plot(LSOA,col=colours_to_map,border = NA)
We can also add additional layers onto the map using a further parameter (“add”) which is set to “TRUE”. Without the “add=TRUE”, every time plot() is called, the previous plot is replaced. Two further parameters are used, “col” to specify the line colour, and “lwd” the line width.
plot(LSOA,col=colours_to_map,border = NA)
plot(roads,add=TRUE, col="#6B6B6B",lwd=0.3)